Abstract

Linear combinations of translations of a single Gaussian, $e^{-x^2}$, are
shown to be dense in $L^2(\mathbb{R})$. Two algorithms for determining the
coefficients for the approximations are given, using orthogonal Hermite
functions and least squares. Taking the Fourier transform of this result shows
low-frequency trigonometric series are dense in $L^2$ with Gaussian weight
function.
1 Linear combinations of Gaussians with a single variance are dense in $L^2$

$L^2(\mathbb{R})$ denotes the space of square integrable functions $f : \mathbb{R} \to \mathbb{R}$ with norm
$\|f\|_2 := \sqrt{\int_{\mathbb{R}} |f(x)|^2\,dx}$. We use $f \approx_{\varepsilon} g$ to mean $\|f - g\|_2 < \varepsilon$. The following
result was announced in [4].

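Theorem 1 For any $f \in L^2(\mathbb{R})$ and any $\varepsilon > 0$ there exist $t > 0$, $N \in \mathbb{N}$
and $a_n \in \mathbb{R}$ such that

\[ f \approx_{\varepsilon} \sum_{n=0}^{N} a_n e^{-(x-nt)^2}. \]

Proof. Since the span of the Hermite functions is dense in $L^2(\mathbb{R})$ we have
for some $N$

\[ f \approx_{\varepsilon/2} \sum_{n=0}^{N} b_n \frac{d^n}{dx^n} e^{-x^2}. \tag{1} \]

Now use finite backward differences to approximate the derivatives. We have
for some small $t > 0$

\[ \sum_{n=0}^{N} b_n \frac{d^n}{dx^n} e^{-x^2} \approx_{\varepsilon/2} b_0 e^{-x^2} + b_1 \frac{1}{t}\left[ e^{-x^2} - e^{-(x-t)^2} \right] + b_2 \frac{1}{t^2}\left[ e^{-x^2} - 2e^{-(x-t)^2} + e^{-(x-2t)^2} \right] + \cdots = \sum_{n=0}^{N} b_n \frac{1}{t^n} \sum_{k=0}^{n} (-1)^k \binom{n}{k} e^{-(x-kt)^2}. \tag{2} \]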



This result may be surprising; it promises we can approximate to any degree
of accuracy a function such as the following characteristic function of an interval

\[ \chi_{[-11,-10]}(x) := \begin{cases} 1 & \text{for } x \in [-11,-10] \\ 0 & \text{otherwise} \end{cases} \]

with support far from the means of the Gaussians $e^{-(x-nt)^2}$, which are located
in $[0,\infty)$ at the points $x = nt$. The graphs of these functions $e^{-(x-nt)^2}$ are
extremely simple geometrically, being Gaussians with the same variance. We
only use the right translates, and they all shrink precipitously (exponentially)
away from their means.

\[ \sum_{n=0}^{N} a_n e^{-(x-nt)^2} \approx \text{characteristic function?} \]



Surely there is a gap in this sketchy little proof? 

No. We will, however, flesh out the details in section 2. The coefficients $a_n$
are explicitly calculated and the convergence carefully justified. But these
details are elementary. We include them in the interest of appealing to a broader
audience.

Then is this merely another pathological curiosity from analysis? We probably
need impractically large values of $N$ to approximate any interesting functions.






No, $N$ need only be as large as the Hermite expansion demands. Certainly
this particular approach depends on the convergence of the Hermite expansion,
and for many applications Hermite series converge slower than other Fourier
approximations; after all, Hermite series converge on all of $\mathbb{R}$ while, e.g.,
trigonometric series focus on a bounded interval. Hermite expansions do have powerful
convergence properties, though. For example, Hermite series converge uniformly
on compact subsets whenever $f$ is twice continuously differentiable (i.e.,
$C^2$) and $O(e^{-cx^2})$ for some $c > 1$ as $x \to \infty$. Alternatively, if $f$ has finitely many
discontinuities but is still $C^2$ elsewhere and $O(e^{-cx^2})$, the expansion again
converges uniformly on any closed interval which avoids the discontinuities [15],
[16]. If $f$ is smooth and properly bounded, the Hermite series converges faster
than algebraically [7].

Then is the method unstable? 

Yes, there are two serious drawbacks to using Theorem 1.

1. Numerical differentiation is inherently unstable. Fortunately we are estimating
the derivatives of Gaussians, which are as smooth and bounded as we could
hope, and so we have good control with an explicit error formula. It is true,
though, that dividing by $t^n$ for small $t$ and large $n$ will eventually lead to huge
coefficients $a_n$ and round-off error. There are quite a few general techniques
available in the literature for combatting round-off error in numerical differentiation.
We review the well-known n-point difference formulas for derivatives in
section 6.

2. The surprising approximation is only possible because it is weaker than the
typical convergence of a series in the mean. Unfortunately,
Theorem 1 requires recalculating all the $a_n$ each time $N$ is increased. Further,
the $a_n$ are not unique. The least squares best choice of the $a_n$ is calculated in
section 3, but this approach gives an ill-conditioned matrix. A different formula
for the $a_n$ is given in Theorem 3, which is more computationally efficient.

Despite these drawbacks the result is worthy of note because of the new
and unexpected opportunities which arise using an approximation method with
such simple functions. In this vein, section 4 details an interesting corollary of
Theorem 1: apply the Fourier transform to see that low-frequency trigonometric
series are dense in $L^2(\mathbb{R})$ with Gaussian weight function.

2 Calculating the coefficients with orthogonal functions

In this section Theorem 3 gives an explicit formula for the coefficients $a_n$ of
Theorem 1. Let's review the details of the Hermite-inspired expansion (1)
claimed in the proof. The formula for these coefficients is

\[ b_n := \frac{1}{n!\,2^n\sqrt{\pi}} \int_{\mathbb{R}} f(x)\, e^{x^2} \left( \frac{d^n}{dx^n} e^{-x^2} \right) dx. \]

Be warned this is not precisely the standard Hermite expansion, but a simple
adaptation to our particular requirements. Let's check this formula for the $b_n$
using the techniques of orthogonal functions.

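Remember the following properties of the Hermite polynomials $H_n$ ([16],
e.g.). Define $H_n(x) := (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}$. The set of Hermite functions

\[ \left\{ h_n(x) := \frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}}\, H_n(x)\, e^{-x^2/2} : n \in \mathbb{N} \right\} \]

is a well-known basis of $L^2(\mathbb{R})$ and is orthonormal since

\[ \int_{\mathbb{R}} H_m(x)\, H_n(x)\, e^{-x^2}\,dx = n!\,2^n\sqrt{\pi}\,\delta_{m,n}. \tag{3} \]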
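The orthogonality relation (3) can be spot-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `hermite_gram` and `expected_norms` are hypothetical and not part of the paper. It uses Gauss-Hermite quadrature, which integrates polynomials against the weight $e^{-x^2}$ exactly up to a degree fixed by the number of nodes.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

def hermite_gram(max_n, quad_points=20):
    """Gram matrix G[m, n] = integral of H_m(x) H_n(x) exp(-x^2) dx.

    Gauss-Hermite quadrature with `quad_points` nodes is exact for
    polynomial integrands of degree below 2 * quad_points.
    """
    x, w = hermgauss(quad_points)
    identity = np.eye(max_n + 1)
    # Row n holds the physicists' Hermite polynomial H_n at the nodes.
    H = np.array([hermval(x, identity[n]) for n in range(max_n + 1)])
    return (H * w) @ H.T

def expected_norms(max_n):
    """Diagonal predicted by (3): n! 2^n sqrt(pi)."""
    return np.array([math.factorial(n) * 2 ** n * math.sqrt(math.pi)
                     for n in range(max_n + 1)])

gram = hermite_gram(5)
```

Up to floating-point error, `gram` should agree with the diagonal matrix built from `expected_norms(5)`.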

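This means given any $g \in L^2(\mathbb{R})$ it is possible to write

\[ g(x) = \sum_{n=0}^{\infty} c_n \frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}}\, H_n(x)\, e^{-x^2/2} \tag{4} \]

(equality in the $L^2$ sense) where

\[ c_n := \frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}} \int_{\mathbb{R}} g(x)\, H_n(x)\, e^{-x^2/2}\,dx \in \mathbb{R}. \]

The necessity of this formula for $c_n$ can easily be checked by multiplying both
sides of (4) by $H_n(x)\, e^{-x^2/2}$, integrating and applying (3). However, we want

\[ f(x) \approx \sum_{n=0}^{N} b_n \frac{d^n}{dx^n} e^{-x^2} \]

so apply this process to $g(x) = f(x)\, e^{x^2/2}$. But $f(x)\, e^{x^2/2}$ may not be
square integrable. If it is not, we must truncate it: $f(x)\, e^{x^2/2} \chi_{[-M,M]}(x)$ is in $L^2$ for any
$M < \infty$ and $f \cdot \chi_{[-M,M]} \approx_{\varepsilon/3} f$ for a sufficiently large choice of $M$. Now we get
new $c_n$ as follows

\[ f(x)\, e^{x^2/2} \chi_{[-M,M]}(x) \approx_{\varepsilon/3} \sum_{n=0}^{N} c_n \frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}}\, H_n(x)\, e^{-x^2/2} \]

\[ f(x)\, \chi_{[-M,M]}(x) \approx_{\varepsilon/3} \sum_{n=0}^{N} c_n \frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}}\, H_n(x)\, e^{-x^2} = \sum_{n=0}^{N} c_n \frac{(-1)^n}{\sqrt{n!\,2^n\sqrt{\pi}}} \frac{d^n}{dx^n} e^{-x^2} = \sum_{n=0}^{N} b_n \frac{d^n}{dx^n} e^{-x^2} \]

where

\[ c_n = \frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}} \int_{\mathbb{R}} f(x)\, e^{x^2/2} \chi_{[-M,M]}(x)\, H_n(x)\, e^{-x^2/2}\,dx = \frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}} \int_{\mathbb{R}} f(x)\, \chi_{[-M,M]}(x)\, H_n(x)\,dx \]

so we must have

\[ b_n = \frac{(-1)^n}{n!\,2^n\sqrt{\pi}} \int_{-M}^{M} f(x)\, H_n(x)\,dx = \frac{1}{n!\,2^n\sqrt{\pi}} \int_{-M}^{M} f(x)\, e^{x^2} \left( \frac{d^n}{dx^n} e^{-x^2} \right) dx. \tag{5} \]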



Now the second step of the proof of Theorem 1 claims that the Gaussian's
derivatives may be approximated by divided backward differences

\[ \frac{d^n}{dx^n} e^{-x^2} \approx \frac{1}{t^n} \sum_{k=0}^{n} (-1)^k \binom{n}{k} e^{-(x-kt)^2} \]

in the $L^2(\mathbb{R})$ norm. We'll use the "big oh" notation: for a real function $\Psi$ the
statement "$\Psi(t) = O(t)$ as $t \to 0$" means there exist $K > 0$ and $\delta > 0$ such
that $|\Psi(t)| < K|t|$ for $0 < |t| < \delta$.



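Proposition 2 For each $n \in \mathbb{N}$ and $p \in (0,\infty)$

\[ \left( \int_{\mathbb{R}} \left| \frac{d^n}{dx^n} e^{-x^2} - \frac{1}{t^n} \sum_{k=0}^{n} (-1)^k \binom{n}{k} e^{-(x-kt)^2} \right|^p dx \right)^{1/p} = O(t). \]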



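Proof. In Appendix 6 the pointwise formula is derived:

\[ \frac{d^n}{dx^n} g(x) = \frac{1}{t^n} \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} g(x+kt) - \frac{t}{(n+1)!} \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} k^{n+1} g^{(n+1)}(\xi_k) \]

where all of the $\xi_k$ are between $x$ and $x + nt$. Replacing $t$ with $-t$ turns
these forward differences into the backward differences of the proposition. Therefore the proposition holds
with $g(x) = e^{-x^2}$ since $g^{(n+1)}(\xi_k)$ is $L^p$ integrable for each $k$. This is not perfectly
obvious because we don't have explicit formulae for the $\xi_k$. But the tails of $g^{(n+1)}$
vanish exponentially, the continuity of $g^{(n+1)}$ guarantees a finite maximum on
the bounded interval between the tails, and $|\xi_k - x| \le k|t|$. ■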
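The $O(t)$ rate in Proposition 2 can be observed numerically for $p = 2$. The sketch below is illustrative only: the function names, the truncation of the integral to $[-10, 10]$ and the Riemann-sum quadrature are assumptions made here, not part of the paper. Halving $t$ should roughly halve the $L^2$ error.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def gaussian_derivative(n, x):
    """Exact n-th derivative of exp(-x^2), namely (-1)^n H_n(x) exp(-x^2)."""
    unit = np.zeros(n + 1)
    unit[n] = 1.0
    return (-1) ** n * hermval(x, unit) * np.exp(-x ** 2)

def backward_difference(n, t, x):
    """(1/t^n) * sum_k (-1)^k C(n, k) exp(-(x - k t)^2)."""
    total = np.zeros_like(x)
    for k in range(n + 1):
        total += (-1) ** k * math.comb(n, k) * np.exp(-(x - k * t) ** 2)
    return total / t ** n

def l2_error(n, t, half_width=10.0, num=40001):
    """Riemann-sum estimate of the L^2 norm of the difference on a wide interval."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    diff = gaussian_derivative(n, x) - backward_difference(n, t, x)
    return math.sqrt(np.sum(diff ** 2) * dx)

errors = [l2_error(2, t) for t in (0.1, 0.05, 0.025)]
ratios = [errors[i] / errors[i + 1] for i in range(2)]
```

The entries of `ratios` should be close to 2, consistent with a first-order error in $t$.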

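Continuing the derivation of the coefficients $a_n$ we now have for sufficiently
small $t$

\[ \sum_{n=0}^{N} b_n \frac{d^n}{dx^n} e^{-x^2} \approx_{\varepsilon/3} \sum_{n=0}^{N} \sum_{k=0}^{n} b_n \frac{(-1)^k}{t^n} \binom{n}{k} e^{-(x-kt)^2} = \sum_{k=0}^{N} \left[ \sum_{n=k}^{N} b_n \frac{(-1)^k}{t^n} \binom{n}{k} \right] e^{-(x-kt)^2}. \tag{6} \]

In the last equality we just switched the order of summation (see [9], section 2.4
for an overview of such tricks). Combining (5) and (6) we have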

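Theorem 3 For any $f \in L^2(\mathbb{R})$ and any $\varepsilon > 0$ there exist $N \in \mathbb{N}$ and $t_0 > 0$
such that for any $t \neq 0$ with $|t| < t_0$

\[ f \approx_{\varepsilon} \sum_{n=0}^{N} a_n e^{-(x-nt)^2} \]

for some choice of $a_n \in \mathbb{R}$ dependent on $N$ and $t$.
If $f(x)\, e^{x^2/2}$ is integrable, then one choice of coefficients is

\[ a_n = \frac{(-1)^n}{n!\sqrt{\pi}} \sum_{k=n}^{N} \frac{1}{(k-n)!\,(2t)^k} \int_{\mathbb{R}} f(x)\, e^{x^2} \left( \frac{d^k}{dx^k} e^{-x^2} \right) dx. \]

If $f(x)\, e^{x^2/2}$ is not integrable, replace $f$ in the above formula with $f \cdot \chi_{[-M,M]}$
where $M$ is chosen large enough that $\|f - f \cdot \chi_{[-M,M]}\|_2 < \varepsilon$.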
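The recipe behind Theorem 3 (compute the $b_n$ from (5), then regroup them into the $a_n$ as in (6)) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `hermite_b`, `gaussian_a` and `evaluate` are hypothetical, the integral is replaced by a Riemann sum on $[-M, M]$, and the equivalent form $b_n = \frac{(-1)^n}{n!\,2^n\sqrt{\pi}} \int f H_n\,dx$ is used so that $e^{x^2}$ is never evaluated. The test function $f(x) = x e^{-x^2} = -\tfrac{1}{2}\frac{d}{dx} e^{-x^2}$ is chosen because its expansion is exactly $b_1 = -\tfrac12$.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_b(f, N, M=10.0, num=20001):
    """b_n = (-1)^n / (n! 2^n sqrt(pi)) * integral over [-M, M] of f(x) H_n(x) dx."""
    x = np.linspace(-M, M, num)
    dx = x[1] - x[0]
    fx = f(x)
    b = np.zeros(N + 1)
    for n in range(N + 1):
        unit = np.zeros(n + 1)
        unit[n] = 1.0
        integral = np.sum(fx * hermval(x, unit)) * dx   # Riemann sum
        b[n] = (-1) ** n * integral / (math.factorial(n) * 2 ** n * math.sqrt(math.pi))
    return b

def gaussian_a(b, t):
    """a_k = sum_{n=k}^{N} b_n (-1)^k C(n, k) / t^n, the bracket in (6)."""
    N = len(b) - 1
    return np.array([
        sum((-1) ** k * math.comb(n, k) * b[n] / t ** n for n in range(k, N + 1))
        for k in range(N + 1)
    ])

def evaluate(a, t, x):
    """sum_n a_n exp(-(x - n t)^2) at the points x."""
    shifts = t * np.arange(len(a))
    return np.exp(-(x[:, None] - shifts[None, :]) ** 2) @ a

f = lambda x: x * np.exp(-x ** 2)
b = hermite_b(f, N=2)
a = gaussian_a(b, t=1e-3)
x = np.linspace(-4.0, 4.0, 801)
max_error = np.max(np.abs(f(x) - evaluate(a, 1e-3, x)))
```

In double precision this only works for small $N$ and moderate $t$, since the $a_n$ grow like $t^{-N}$; larger cases would need extended-precision arithmetic.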

Remark 4 The approximation in Theorem 3 also holds on $C[a,b]$ with the
uniform norm since the Hermite expansion is uniformly convergent on $[a,b]$
(see [15], [16]) and the finite difference formula's error term from Appendix 6
converges to 0 uniformly as $t \to 0$. The Stone-Weierstrass Theorem does not
apply in this situation because linear combinations of Gaussians with a single
variance do not form an algebra.

Remark 5 As a consequence of Theorem 3, for any $\varepsilon > 0$ the closed linear span
of $\{ e^{-(x-s)^2} : s \in [0,\varepsilon) \}$ is $L^2(\mathbb{R})$. It is even sufficient to replace $[0,\varepsilon)$ with

\[ \{ i/j : i, j \in \mathbb{N} \} \cap [0,\varepsilon). \]

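Let's explore some concrete examples in applying Theorem 3. Choose an
interesting function with discontinuities and with part of its support on the negative axis:

\[ f(x) := (x-1)^2\, \chi_{[-1,2]}(x) := \begin{cases} (x-1)^2 & \text{for } x \in [-1,2] \\ 0 & \text{otherwise} \end{cases} \]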



and observe graphically:

[Figure: panels showing $f$; its Hermite series with $N = 20$ and $N = 40$; and the Theorem 3 approximations with $N = 20$, $t = .05$; $N = 20$, $t = .01$; and $N = 40$, $t = .01$.]
The Hermite approximation is slowed by discontinuities, but does converge.
The next choice of $f$ is continuous but not smooth.

[Figure: Theorem 3 approximations with $N = 10$, $t = .01$; $N = 20$, $t = .05$; and $N = 20$, $t = .01$.]



In section 6 we review a standard technique accelerating this convergence in
$t$. In our experiments, though, we've found the Hermite expansion is generally
the bottleneck, not the round-off error of the derivative approximations.

[Figure: Hermite expansions with $N = 60$, $N = 100$ and $N = 120$.]



We need about 120 terms before visual accuracy is achieved for this simple
function. There is a host of methods in the literature for improving convergence
of the Hermite expansion, but generally we have better success with functions
that are smooth and bounded [7]. Our last examples in this section illustrate
how convergence is faster for functions which are smooth and "clamped off",
meaning multiplied by $(x-a)(x+a)\,\chi_{[-a,a]}$, whether or not they are positive
or symmetric.

[Figure: two example functions, each shown with its Hermite expansions for $N = 10$ and $N = 25$.]


3 Calculating the coefficients with least squares 



Theorem 1 promises any function can be approximated as

\[ f(x) \approx_{\varepsilon} \sum_{n=0}^{N} a_n e^{-(x-nt)^2}. \]

Theorem 3 gives a formula for the coefficients $a_n$, but this choice is not unique,
and in fact is not "best" according to the classical continuous least squares
technique.

[Figure: least squares approximation and Theorem 3 approximation, both with $N = 5$ and the same $t$.]



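In least squares we minimize the error function

\[ E_2(a_0,\dots,a_N) := \int_{\mathbb{R}} \left| f(x) - \sum_{n=0}^{N} a_n e^{-(x-nt)^2} \right|^2 dx \]

by setting $\frac{\partial E_2}{\partial a_j} = 0$ for $j = 0, \dots, N$ and solving for the $a_n$. These $N + 1$ linear
equations are called the normal equations. The matrix form of this system is
$M\mathbf{a} = \mathbf{b}$ where $M$ is the matrix

\[ M = \left[ \int_{\mathbb{R}} e^{-(x-jt)^2} e^{-(x-kt)^2}\,dx \right]_{j,k=0}^{N} = \left[ \sqrt{\tfrac{\pi}{2}}\; e^{-(j-k)^2 t^2/2} \right]_{j,k=0}^{N} \]

and $\mathbf{b}$ has entries $\int_{\mathbb{R}} f(x)\, e^{-(x-jt)^2}\,dx$ for $j = 0, \dots, N$.

$M$ is symmetric and invertible, so we can always solve for the $a_n$. But these
least squares matrices are notorious for being ill-conditioned when using
non-orthogonal approximating functions. The Hilbert matrix is the archetypical
example. The current application is no exception since the matrix entries are
very similar for most choices of $N$ and $t$, so round-off error is extreme. Choosing
$N = 7$ instead of 5 in the graphed example above requires almost 300 significant
digits.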
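The ill-conditioning is easy to see numerically. The following sketch assumes the closed form $\sqrt{\pi/2}\,e^{-(j-k)^2t^2/2}$ for the inner products of translated Gaussians (it follows from completing the square) and uses a hypothetical helper `gram_matrix`; the value $t = 0.1$ is chosen here only for illustration.

```python
import math
import numpy as np

def gram_matrix(N, t):
    """Normal-equation matrix with entries
    integral exp(-(x - j t)^2) exp(-(x - k t)^2) dx = sqrt(pi/2) exp(-(j - k)^2 t^2 / 2)."""
    idx = np.arange(N + 1)
    diff = idx[:, None] - idx[None, :]
    return math.sqrt(math.pi / 2) * np.exp(-(diff * t) ** 2 / 2)

# 2-norm condition numbers for a few sizes (assumed spacing t = 0.1).
conds = {N: float(np.linalg.cond(gram_matrix(N, t=0.1))) for N in (1, 3, 5)}
```

Because neighbouring rows of the matrix are nearly identical for small $t$, the condition number is already very large for $N = 5$, which is why ordinary double precision is not enough for the examples in this section.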



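4 Low-frequency trig series are dense in $L^2$ with Gaussian weight

For $f \in L^2(\mathbb{R}, \mathbb{C})$ define the norm

\[ \|f\|_{2,G} := \left( \int_{\mathbb{R}} |f(x)|^2\, e^{-x^2}\,dx \right)^{1/2}. \]

Write $f \approx_{\varepsilon,G} g$ to mean $\|f - g\|_{2,G} < \varepsilon$.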



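Theorem 6 For every $f \in L^2(\mathbb{R}, \mathbb{C})$ and $\varepsilon > 0$ there exist $N \in \mathbb{N}$ and $t_0 > 0$
such that for any $t \neq 0$ with $|t| < t_0$

\[ f(x) \approx_{\varepsilon,G} \sum_{n=0}^{N} a_n e^{-intx} \]

for some choice of $a_n \in \mathbb{C}$ dependent on $N$ and $t$.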

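Proof. We use the Fourier transform with convention

\[ \mathcal{F}[f](s) := \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x)\, e^{-isx}\,dx. \]

$\mathcal{F}$ is a linear isometry of $L^2(\mathbb{R}, \mathbb{C})$ with

\[ \mathcal{F}\left[ e^{-x^2} \right](s) = \frac{1}{\sqrt{2}}\, e^{-s^2/4}, \]
\[ \mathcal{F}[f(x + r)] = e^{irs}\, \mathcal{F}[f(x)] \quad \text{and} \]
\[ \mathcal{F}[g * h] = \sqrt{2\pi}\, \mathcal{F}[g]\, \mathcal{F}[h], \]

where $*$ is convolution.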

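Let $f \in L^2$; we now show $f_2(x) := \frac{1}{\sqrt{2\pi}} \left( e^{-x^2} * \mathcal{F}^{-1}[f] \right)(x) \in L^2$. Notice
$g := \mathcal{F}^{-1}[f] \in L^2$ and

\[ \|f_2\|_2^2 = \int_{\mathbb{R}} \left| \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} g(x-y)\, e^{-y^2}\,dy \right|^2 dx \le c \int_{\mathbb{R}} \int_{\mathbb{R}} |g(x-y)|^2 e^{-y^2}\,dy\,dx = c \int_{\mathbb{R}} W_t\left[ |g|^2 \right](x)\,dx = c \left\| |g|^2 \right\|_1 = c \|g\|_2^2 = c \|f\|_2^2 < \infty \]

for some $c > 0$ (whose value may change from step to step). The inequality is
Cauchy-Schwarz applied to $g(x-y)\, e^{-y^2/2} \cdot e^{-y^2/2}$. Here $W_t[h]$ is the solution to the diffusion equation for time $t$
and initial condition $h$. (The notation $W$ refers to the Weierstrass transform.)
The reason for the third equality in the previous calculation is that $W_t$ maintains
the $L^1$ integral of any positive initial condition $h$ for all time $t > 0$ [17].

Now approximate the real and imaginary parts of $f_2$ with Theorem 3. Then
we get

\[ f_2(x) \approx_{\varepsilon} \sum_{n=0}^{N} a_n e^{-(x-nt)^2}, \qquad a_n \in \mathbb{C}, \]

and applying $\mathcal{F}$ gives

\[ \frac{1}{\sqrt{2}}\, f(s)\, e^{-s^2/4} \approx_{\varepsilon} \sum_{n=0}^{N} a_n e^{-ints}\, \frac{1}{\sqrt{2}}\, e^{-s^2/4}. \]

Hence

\[ f(s) \approx_{\sqrt{2}\varepsilon,\,G} \sum_{n=0}^{N} a_n e^{-ints} \]

using the fact that $e^{-s^2/2} \ge e^{-s^2}$.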
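The two Fourier transform facts that drive the proof (the transform of the Gaussian and the shift rule with $r = -nt$) can be checked numerically under the stated convention. This is a rough sketch with a hypothetical helper `fourier_transform`; it truncates the integral to $[-12, 12]$ and uses a Riemann sum, which is an assumption made here for simplicity.

```python
import numpy as np

def fourier_transform(f, s, half_width=12.0, num=24001):
    """F[f](s) = (1/sqrt(2 pi)) * integral f(x) exp(-i s x) dx, via a Riemann sum."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    return np.sum(f(x) * np.exp(-1j * s * x)) * dx / np.sqrt(2 * np.pi)

gaussian = lambda x: np.exp(-x ** 2)

# Transform of the Gaussian compared with exp(-s^2/4) / sqrt(2).
s_values = np.array([0.0, 0.5, 1.0, 2.0])
numeric = np.array([fourier_transform(gaussian, s) for s in s_values])
closed_form = np.exp(-s_values ** 2 / 4) / np.sqrt(2)

# Shift rule with r = -n t: F[exp(-(x - n t)^2)](s) = exp(-i n t s) F[exp(-x^2)](s).
n, t, s = 3, 0.1, 1.5
shifted = fourier_transform(lambda x: np.exp(-(x - n * t) ** 2), s)
predicted = np.exp(-1j * n * t * s) * np.exp(-s ** 2 / 4) / np.sqrt(2)
```

Both comparisons should agree to many digits, since the Riemann sum is extremely accurate for rapidly decaying smooth integrands.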






This result is surprising, even in the context of this paper, because for in- 

N 

stance, series of the form ^ ane~**^^^^^^ for ah t and are not dense in 

and in fact only inhabit a 4-dimensional subspace of the infinite dimensional
Hilbert space [3].

Corollary 7 On any finite interval $[a,b]$, for any $\omega > 0$ the finite linear combinations
of sine and cosine functions with frequency lower than $\omega$ are dense in
$L^2([a,b],\mathbb{R})$.

Proof. On $[a,b]$ the Gaussian is bounded above and below by positive constants, and so the norms with or without
weight function are equivalent. Apply Theorem 6 to $f \in L^2([a,b],\mathbb{R})$ (extended by zero outside $[a,b]$) and
choose $t$ such that $Nt < \omega$ to get

\[ f \approx \sum_{n=0}^{N} \mathrm{Re}(a_n) \cos(ntx) + \mathrm{Im}(a_n) \sin(ntx), \]

since $f$ is real valued and the real part of $a_n e^{-intx}$ is $\mathrm{Re}(a_n)\cos(ntx) + \mathrm{Im}(a_n)\sin(ntx)$.

Applying Remark 5 to this result shows even discrete sets of positive frequencies
that approach 0 make the span of the corresponding sine and cosine
functions equal to $L^2([a,b],\mathbb{R})$.

Finally, low-frequency cosines span the even functions: 

Proposition 8 On any finite interval $[0,b]$, for any $\omega > 0$ the finite linear
combinations of cosine functions with frequency lower than $\omega$ are dense in
$L^2([0,b],\mathbb{R})$.

Proof. Let $f \in L^2([0,b],\mathbb{R})$ and extend it as an even function on $[-b,b]$.

Now use the previous corollary to write

\[ f \approx_{\varepsilon} \sum_{n=0}^{N} a_n \cos(ntx) + b_n \sin(ntx). \]

We'd like to conclude right now that the $b_n = 0$ or $b_n \approx 0$, but that is not true.
However, every function $g$ on $[-b,b]$ may be written uniquely as a sum of even
and odd functions

\[ g = g_e + g_o, \qquad g_e(x) := \frac{g(x) + g(-x)}{2}, \qquad g_o(x) := \frac{g(x) - g(-x)}{2}, \]

and so

\[ g \approx_{\varepsilon} h \implies g_e \approx_{\varepsilon} h_e. \]

Therefore

\[ f = f_e \approx_{\varepsilon} \left[ \sum_{n=0}^{N} a_n \cos(ntx) + b_n \sin(ntx) \right]_e = \sum_{n=0}^{N} a_n \cos(ntx). \]
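The even-part argument in this proof can be illustrated on a grid. In the sketch below the coefficients, the target function $|x|$ and the helper names are all hypothetical choices made for illustration; the point is only that taking even parts removes the sine terms exactly and never increases the $L^2$ distance.

```python
import numpy as np

def even_part(values):
    """Even part (g(x) + g(-x)) / 2 of samples on a grid symmetric about 0."""
    return 0.5 * (values + values[::-1])

def l2_norm(values, dx):
    """Riemann-sum estimate of the L^2 norm."""
    return np.sqrt(np.sum(values ** 2) * dx)

b_end, t = 2.0, 0.05
x = np.linspace(-b_end, b_end, 4001)
dx = x[1] - x[0]
n = np.arange(3)
a = np.array([0.7, -1.2, 0.4])      # hypothetical cosine coefficients
b = np.array([0.0, 50.0, -30.0])    # hypothetical (large) sine coefficients

cos_sum = np.cos(np.outer(x, n * t)) @ a
sin_sum = np.sin(np.outer(x, n * t)) @ b
trig_sum = cos_sum + sin_sum

g = np.abs(x)                        # an even target function
dist_full = l2_norm(g - trig_sum, dx)
dist_even = l2_norm(even_part(g) - even_part(trig_sum), dx)
```

Here `even_part(trig_sum)` coincides with `cos_sum` up to rounding, and `dist_even` does not exceed `dist_full`, which is the implication $g \approx_{\varepsilon} h \implies g_e \approx_{\varepsilon} h_e$ in discrete form.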



Beware this last result; it's not as strong as Fourier approximation. The
coefficients for the sine functions calculated above may be large; the proposition
merely promises the linear combination of the sine terms is small. The least
squares solution, however, will have vanishing sine coefficients.



5 Origins and generalizations 

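The mathematical inspiration for Theorem 1 comes from geometrical investigations
in infinite dimensional control theory. We noticed that function translation
and vector translation in $L^2(\mathbb{R})$ do not commute. Specifically, "function translation"
is a flow on the infinite dimensional vector space $L^2(\mathbb{R})$ given by the
map $F : L^2(\mathbb{R}) \times \mathbb{R} \to L^2(\mathbb{R})$ where $F_t(f)(x) := f(x+t)$. "Vector translation"
in the direction of $g \in L^2(\mathbb{R})$ is the flow $G : L^2(\mathbb{R}) \times \mathbb{R} \to L^2(\mathbb{R})$ where
$G_t(f) := f + tg$. Taking for example $g(x) := e^{-x^2}$ and composing $F$ and $G$ we
see $F_t \circ G_t \neq G_t \circ F_t$ since for $f = 0$

\[ F_t \circ G_t(f)(x) = t e^{-(x+t)^2} \quad \text{while} \quad G_t \circ F_t(f)(x) = t e^{-x^2}. \]

Notice however the key fact

\[ \frac{F_t \circ G_t(f) - G_t \circ F_t(f)}{t^2} = \frac{e^{-(x+t)^2} - e^{-x^2}}{t} \to \frac{d}{dx} e^{-x^2} \quad \text{as } t \to 0. \]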
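The commutator quotient can be evaluated directly by composing the two flows as functions. This is a small sketch with hypothetical helper names (`translate`, `add_multiple`, `commutator_quotient`); it starts from $f = 0$ and compares the quotient with $\frac{d}{dx} e^{-x^2} = -2x e^{-x^2}$ on a grid.

```python
import numpy as np

def gaussian(x):
    return np.exp(-x ** 2)

def translate(f, t):
    """Function translation F_t(f)(x) = f(x + t)."""
    return lambda x: f(x + t)

def add_multiple(f, t, g=gaussian):
    """Vector translation G_t(f) = f + t g."""
    return lambda x: f(x) + t * g(x)

def commutator_quotient(t, x):
    """(F_t o G_t - G_t o F_t)(0) / t^2 evaluated at the points x."""
    zero = lambda y: np.zeros_like(y)
    fg = translate(add_multiple(zero, t), t)    # F_t applied after G_t
    gf = add_multiple(translate(zero, t), t)    # G_t applied after F_t
    return (fg(x) - gf(x)) / t ** 2

x = np.linspace(-3.0, 3.0, 601)
exact = -2 * x * np.exp(-x ** 2)                # d/dx exp(-x^2)
gaps = [np.max(np.abs(commutator_quotient(t, x) - exact)) for t in (1e-1, 1e-2, 1e-3)]
```

The entries of `gaps` shrink roughly in proportion to $t$, in line with the limit stated above.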



In finite dimensions the commutator quotient above gives the Lie bracket $[X,Y]$
of the vector fields $X$ and $Y$ which generate the flows $F$ and $G$, respectively. A
fundamental result in finite-dimensional control theory states that the reachable
set via $X$ and $Y$ is given by the integral surface to the distribution made up
of iterated Lie brackets starting from $X$ and $Y$ (Chow's Theorem, which is an
interpretation of Frobenius' Foliation Theorem, see [13], e.g.). The idea we are
exploiting is that iterated Lie brackets for our flows $F$ and $G$ will give successive
derivatives of the Gaussian, whose span is dense in $L^2(\mathbb{R})$. Consequently, the
reachable set via $F$ and $G$ from $f = 0$ should be all of $L^2(\mathbb{R})$. That is to
say, sums of translates and multiples of one Gaussian (with fixed variance) can
approximate any square integrable function.

Unfortunately this program doesn't automatically work on the infinite dimensional
vector space $L^2(\mathbb{R})$ since the function translation flow is not generated
by a simple vector field on $L^2(\mathbb{R})$. So instead of studying vector fields,
we consider flows as primary. The fundamental results can be rewritten and
still hold in the general context of a metric space [3]. Then other functions
besides $g(x) = e^{-x^2}$ can be checked to be derivative generating and other flows
may be used in place of translation. E.g., Fourier approximation is achieved
using dilation $F : L^2(\mathbb{R},\mathbb{C}) \times \mathbb{R} \to L^2(\mathbb{R},\mathbb{C})$ where $F_t(f)(x) := f(e^t x)$ and
$G_t(f)(x) := f(x) + t e^{ix}$. This gives us a general tool for determining the
density of various families of functions.

Another opportunity for generalizing the results of this paper presents itself
with the observation that Hermite expansions are valid for functions defined on
$\mathbb{C}$ or $\mathbb{R}^n$ and in spaces of tempered distributions; and divided differences work
in all of these spaces as well.

Note also that while the results of section 2 work for uniform approximations
of continuous functions on finite intervals (Remark 4), this is an open question
for low-frequency trigonometric approximations.

The results of this paper can be ported to the language of control theory
where we can then conclude the system

\[ u_t = c_1(t)\, u_x + c_2(t)\, e^{-x^2} \tag{7} \]

is bang-bang controllable with controls of the form $c_1, c_2 : \mathbb{R}^+ \to \{-1, 0, 1\}$.
Theorem 3 drives the initial condition $f = 0$ to any state in $L^2(\mathbb{R})$ under the
system (7), but may be nowhere near optimal for approximating a function
such as $e^{-(x+10)^2}$, since it uses only Gaussians $e^{-(x+s)^2}$ with choices of $s \ll 10$.

Finally, interpreting Theorem 1 in terms of signal analysis, we see a Gaussian
filter is a universal synthesizer with arbitrarily short load time. Let $G(x) := \frac{1}{\sqrt{\pi}} e^{-x^2}$.
A Gaussian filter is a linear time-invariant system represented by the
operator

\[ W(f)(x) := (f * G)(x) = \frac{1}{\sqrt{\pi}} \int_{\mathbb{R}} f(y)\, e^{-(x-y)^2}\,dy. \]

Notice if you feed $W$ a Dirac delta distribution $\delta_t$ (an ideal impulse at time
$x = t$) you get $W(\delta_t) = G(x - t)$. Then Theorem 1 gives

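Corollary 9 For any $f \in L^2(\mathbb{R})$ and any $\varepsilon > 0$ and any $\tau > 0$ there exist
$t > 0$ and $N \in \mathbb{N}$ with $tN < \tau$ such that

\[ f \approx_{\varepsilon} W\left( \sum_{n=0}^{N} a_n \delta_{nt} \right) \]

for some choice of $a_n \in \mathbb{R}$.

Feed a Gaussian filter a linear combination of impulses and we can synthesize
any signal with arbitrarily small load time $\tau$. The design of physical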
approximations to an analog Gaussian filter are detailed in [6], pT] . 



6 Appendix: Approximating higher derivatives 

The results in this paper may be much improved with voluminous techniques
available from numerical analysis. E.g., [8] gives an algorithm which speeds the
calculation of sums of Gaussians, and [10] explores Hermite expansion acceleration
useful in step 1 of the proof of Theorem 1. This section is devoted to
reviewing methods which improve the error in step 2, approximating derivatives
of the Gaussian with finite differences. We also derive the error formula used in
Proposition 2.

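Above we approximated derivatives with the formula

\[ \frac{d^n}{dx^n} f(x) = \underbrace{\frac{1}{t^n} \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} f(x + kt)}_{\text{gives round-off error as } t \to 0^+} + \underbrace{O(t)}_{\text{truncation error}}. \tag{8} \]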



The Nörlund-Rice integral may be of interest for extremely large $n$ as it avoids
the calculation of the binomial coefficient by evaluating a complex integral.
In this section, though, we devote our attention to deriving n-point formulas;
these formulas decrease round-off error by increasing the number of evaluations
$f(x + kt)$. This shrinks the truncation error without sending $t \to 0$.
In approximating the $k$th derivative with an $n + 1$ point formula

\[ f^{(k)}(x) \approx \frac{1}{t^k} \sum_{i=0}^{n} c_i f(x + k_i t) \]

we wish to calculate the coefficients $c_i$. In the forward difference method, the
$k_i = i$, but keeping these values general allows us to find the coefficients for the
central or backward difference formulas just as easily. The following method for
finding the $c_i$ was shown to us by our student Jeffrey Thornton who rediscovered
the formula.

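Taylor's Theorem has

\[ f(x + k_i t) = \sum_{j=0}^{n} \frac{(k_i t)^j}{j!} f^{(j)}(x) + \frac{(k_i t)^{n+1}}{(n+1)!} f^{(n+1)}(\xi_i) \]

for some $\xi_i$ between $x$ and $x + k_i t$. From this it follows

\[ \sum_{i=0}^{n} c_i f(x + k_i t) = \sum_{j=0}^{n} t^j f^{(j)}(x) \left( \sum_{i=0}^{n} \frac{k_i^j}{j!}\, c_i \right) + \frac{t^{n+1}}{(n+1)!} \sum_{i=0}^{n} c_i k_i^{n+1} f^{(n+1)}(\xi_i). \]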
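Now pick $c = [c_i]$ as a solution to

\[ \begin{bmatrix} 1 & 1 & \cdots & 1 \\ k_0 & k_1 & \cdots & k_n \\ \frac{k_0^2}{2!} & \frac{k_1^2}{2!} & \cdots & \frac{k_n^2}{2!} \\ \vdots & \vdots & & \vdots \\ \frac{k_0^n}{n!} & \frac{k_1^n}{n!} & \cdots & \frac{k_n^n}{n!} \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \quad (1 \text{ in the } k\text{-th position, counting from } 0) \tag{9} \]

which is possible since the $k_i$ are different, so the matrix is invertible, as is seen
using the Vandermonde determinant

\[ \det = \frac{\prod_{0 \le i < j \le n} (k_j - k_i)}{\prod_{2 \le i \le n} i!}. \]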

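Then we must have

\[ \sum_{i=0}^{n} c_i f(x + k_i t) = t^k f^{(k)}(x) + \frac{t^{n+1}}{(n+1)!} \sum_{i=0}^{n} c_i k_i^{n+1} f^{(n+1)}(\xi_i). \]

Therefore

\[ f^{(k)}(x) = \frac{1}{t^k} \sum_{i=0}^{n} c_i f(x + k_i t) + \text{Error} \]

for $c_i$ which satisfy (9) where

\[ \text{Error} = -\frac{t^{n+1-k}}{(n+1)!} \sum_{i=0}^{n} c_i k_i^{n+1} f^{(n+1)}(\xi_i). \]

This Error formula shows how truncation error may be decreased by increasing
$n$ without shrinking $t$, thus combatting round-off error at the expense of
increased computation of sums.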

The coefficients in (8) are obtained by solving (9) for the $c_i$ with $k = n$ and the $k_i$ chosen
as $k_i = i$.
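System (9) is small enough to solve directly. The sketch below is an illustration under the assumption that a dense linear solve is acceptable for a handful of nodes; `difference_coefficients` is a hypothetical name, not a routine from the paper. It recovers the forward-difference weights $(-1)^{n-i}\binom{n}{i}$ used in (8) and the familiar central second-difference stencil.

```python
import math
import numpy as np

def difference_coefficients(nodes, k):
    """Solve (9): sum_i c_i k_i^j / j! = 1 if j == k else 0, for j = 0..n."""
    nodes = np.asarray(nodes, dtype=float)
    n = len(nodes) - 1
    # Row j of the matrix holds k_i^j / j! for each node k_i.
    A = np.array([nodes ** j / math.factorial(j) for j in range(n + 1)])
    rhs = np.zeros(n + 1)
    rhs[k] = 1.0
    return np.linalg.solve(A, rhs)

forward = difference_coefficients([0, 1, 2, 3], k=3)   # nodes k_i = i, k = n = 3
central = difference_coefficients([-1, 0, 1], k=2)     # central second difference

# Usage: approximate the second derivative of exp at 0 with step t.
t = 1e-2
approx = sum(c * math.exp(node * t) for c, node in zip(central, [-1, 0, 1])) / t ** 2
```

For larger $n$ the matrix becomes badly conditioned, so a closed-form inverse or higher precision would be preferable to a plain solve.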

Thornton also points out that the $k_i$ may be chosen as complex values when $f$
is analytic (as is the case with our Gaussians). This gives us another opportunity
to mitigate round-off error, since a greater quantity of regularly-spaced nodes
$k_i$ can be packed into an epsilon ball around zero in the complex plane than on
the real line.
As a final note we mention there have been numerous advances to the present
day in inverting the Vandermonde matrix. We mention only the earliest application
to numerical differentiation [14], which gives a formula in terms of the
Stirling numbers.